Making an HTML parser for a website

David.

Hey guys,

 

Let's say I'm trying to parse data from a website that has no JSON API or RSS feed. For instance, this forum website. If I were to make an app for this website, would it be wise to parse the raw HTML, or is there a better way?

 

 

Thanks!

OzzySM12

It will depend on what you can get from the site.

 

Does the site have a print version or a basic/mobile template you can request? If you can get those then it will send more of the data you need and less of the added sh*t.

 

Also try asking the folk to set up an RSS feed. I'd imagine they would if it is of benefit to them.

fastman92

If you want to parse an HTML document and read the necessary information, there are a few decent HTML parsers.

For C# I have used Html Agility Pack.

Using regex is not a good way to extract information from an HTML document, because the document may be poorly written (unclosed tags, inconsistent quoting and so on).

Writing a complete HTML parser yourself takes solid programming skills, so it's better to use an existing one.

eggburt

What language are you using? The best ones I know of are

 

(Python) Beautiful Soup

(PHP) PHP DOM

(JavaScript) jQuery

 

If you clarify what language you're after I'd be happy to provide a few more hints :-)

David.

 

Well, for GTAforums.com there is no mobile version, and I don't think we'll have an RSS feed set up anytime soon.

 

@fastman92

 

Since I'm coding in Java, I've found that TagSoup and HtmlCleaner are pretty good.

 

I guess what I'm asking is: is this really a good idea? Just parsing the raw HTML documents to make a mobile app?

fastman92

If the server doesn't provide any data except pages, then you have to parse the HTML pages to get the information.

You can find the information by parsing the HTML with a library and then selecting the items you need by ID, class, name, tag, or some other criterion.
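
As a rough illustration of selecting items by tag and attribute, here is a minimal sketch that collects the links to topics from a sloppy page. It uses the lenient HTML parser that ships with desktop Java (javax.swing.text.html.parser) only so that it runs without extra dependencies; that package is not available on Android, where you would use a library such as the ones named in this thread. The class name TopicLinks and the sample markup are made up for the example, and filtering on "showtopic=" is an assumption based on the topic URL format mentioned in this thread.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: the name and the "showtopic=" filter are illustrative assumptions.
final class TopicLinks {

    /** Collects the href of every <a> tag whose link contains "showtopic=". */
    static List<String> extract(String html) {
        List<String> links = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag != HTML.Tag.A) {
                    return; // select by tag: only anchors are interesting
                }
                Object href = attrs.getAttribute(HTML.Attribute.HREF);
                if (href != null && href.toString().contains("showtopic=")) {
                    links.add(href.toString()); // select by attribute value
                }
            }
        };
        try {
            // The parser tolerates unclosed tags and mixed-case markup.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return links;
    }

    public static void main(String[] args) {
        // Made-up sample: <li> tags are never closed, tag case and quoting are mixed.
        String page = "<UL><li><a href=\"/index.php?showtopic=1\">First topic</a>"
                + "<li><A HREF='/index.php?showtopic=2'>Second topic</A>"
                + "<li><a href=\"/about\">About</a></UL>";
        System.out.println(extract(page));
    }
}
```

The same idea carries over to other parsers: let the library cope with the broken markup, then pick out elements by tag, ID or class instead of pattern-matching the raw text.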

eggburt

 

David. wrote: "I guess what I'm asking is this a really good idea? Just parsing the raw HTML documents to make a mobile app?"

Purely from a support point of view, I'd try to find another option. Generally speaking, if the site changes its layout in any way, your app will break.

 

What site is it you're looking to app-ify? If it was indeed GTAF your best bet would be the printable version of each page.

 

For example this topic is

http://www.gtaforums.com/index.php?showtopic=558931

 

You'd take the showtopic value and use it for the t value here

http://www.gtaforums.com/index.php?act=Pri...rinter&t=558931

 

The printable page is more likely to keep the same layout for a long time.
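
A minimal sketch of that mapping in Java, in case it helps. The printable URL above is shortened by the forum, so the sketch does not guess the missing part: the caller passes in everything up to and including "t=" as printablePrefix. The class and method names are made up for the example.

```java
import java.net.URI;
import java.util.Optional;

// Hypothetical helper; names are illustrative, not part of any forum API.
final class TopicUrls {

    /** Returns the value of the "showtopic" query parameter, if the URL has one. */
    static Optional<String> topicId(String topicUrl) {
        String query = URI.create(topicUrl).getRawQuery();
        if (query == null) {
            return Optional.empty();
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0 && pair.substring(0, eq).equals("showtopic")) {
                return Optional.of(pair.substring(eq + 1));
            }
        }
        return Optional.empty();
    }

    /**
     * Builds the printable URL by appending the topic id to printablePrefix.
     * printablePrefix is an assumption supplied by the caller: the full
     * printable URL up to and including "t=".
     */
    static Optional<String> printableUrl(String topicUrl, String printablePrefix) {
        return topicId(topicUrl).map(id -> printablePrefix + id);
    }
}
```

For this topic, TopicUrls.topicId("http://www.gtaforums.com/index.php?showtopic=558931") yields 558931, which then becomes the t value.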

 

 

Something else to watch out for, however: in this case the printable page shows the entire topic. If your app becomes popular it may not last long, because of the load it could put on the site and, depending on the site's hosting plan, how much bandwidth it chomps through.
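
One way to soften that load is to cache each page inside the app for a while instead of downloading it on every view. The sketch below is only an illustration under assumptions: CachedFetcher is a made-up name, the actual download is passed in as a function (so any HTTP client can be plugged in), and the right maximum age is something to agree with the site rather than a known value.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical in-memory cache; names and the max-age policy are assumptions.
final class CachedFetcher {

    private static final class Entry {
        final String body;
        final long fetchedAtMillis;

        Entry(String body, long fetchedAtMillis) {
            this.body = body;
            this.fetchedAtMillis = fetchedAtMillis;
        }
    }

    private final Map<String, Entry> cache = new HashMap<>();
    private final Function<String, String> download; // performs the real request
    private final LongSupplier clockMillis;          // e.g. System::currentTimeMillis
    private final long maxAgeMillis;

    CachedFetcher(Function<String, String> download, LongSupplier clockMillis, long maxAgeMillis) {
        this.download = download;
        this.clockMillis = clockMillis;
        this.maxAgeMillis = maxAgeMillis;
    }

    /** Returns the cached page if it is younger than maxAgeMillis, otherwise downloads it again. */
    synchronized String get(String url) {
        long now = clockMillis.getAsLong();
        Entry entry = cache.get(url);
        if (entry != null && now - entry.fetchedAtMillis < maxAgeMillis) {
            return entry.body; // fresh enough: no request hits the site
        }
        String body = download.apply(url);
        cache.put(url, new Entry(body, now));
        return body;
    }
}
```

Passing the clock in as a parameter is just a convenience for testing; in the app you would hand it System::currentTimeMillis.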

 

I'd definitely recommend seeking permission from the site first; they may even have a better option for you that you don't know about.

 

 

David.

Thanks for all the help guys!

 

 

I guess I'll try asking Mr. Tank by mail or something whether there are any additional options. Otherwise I'll just go ahead with the printable version like eggburt suggested.

 

I'm rather new to Android development, and I thought it would be a fun project to try out: making an app for GTAF.
